Abstract

A full structural description of transition state ensembles in protein folding includes the specificity 
of the ordered residues composing the folding nucleus as well as spatial density. To our knowledge,
the spatial properties of the folding nucleus and the interface of specific nuclei have yet to receive
significant attention. We analyzed folding routes predicted by a variational model in terms of a
generalized formalism of the capillarity scaling theory that assumes the volume of the folded core
of the nucleus grows with chain length as $V_f \sim N^{3\nu}$. For 28 two-state proteins studied, the scaling
exponent $\nu$ ranges from 0.2 to 0.45 with an average of 0.33. This average value corresponds to
packing of rigid objects, though generally the effective monomer size in the folded core is larger
than the corresponding volume per particle in the native state ensemble. That is, on average the
folded core of the nucleus is found to be relatively diffuse. We also studied the growth of the
folding nucleus and interface along the folding route in terms of the density or packing fraction. 
The evolution of the folded core and interface regions can be classified into three patterns of growth 
depending on how the growth of the folded core is balanced by changes in density of the interface. 
Finally, we quantify the diffuse versus polarized structure of the critical nucleus through direct 
calculation of the packing fraction of the folded core and interface regions. Our results support the 
general picture of describing protein folding as the capillarity-like growth of folding nuclei.

The modern theory of protein folding describes the mechanism for folding as an entropic
bottleneck arising from the decreasing number of accessible pathways available to a protein
as it becomes ordered. The collection of partially ordered conformations corresponding to
this bottleneck region is known as the transition state ensemble or critical folding nucleus.
Although it is common to focus on the degree of native-like order of specific residues, a 
complete description of the protein folding mechanism also includes the spatial properties 
such as size or density of the transition state ensemble. Indeed, shortly after characterizing 
the transition state ensemble of CI2, Fersht summarized the structure of the critical nucleus 
by a spatial description through the proposal of the nucleation-condensation mechanism.
This critical nucleus can be thought of as an expanded, partially ordered version of the
native state ensemble with concomitant long-range tertiary and local secondary structure.
It is now clear that while diffuse nuclei appear to be the general rule, some nuclei are less
diffuse than others. Polarized nuclei have highly structured residues which are spatially
clustered in the native structure, while the rest of the residues show little definite order. 

Such nuclei are similar to the capillarity approximation in homogeneous nucleation,
in which a droplet of the stable phase is separated from the metastable phase
by a very sharp interface. Exploiting this analogy, Wolynes describes a nucleus with
capillarity-like order in which the interface surrounding a relatively folded core is broadened
by wetting of partially ordered residues. In this picture, folding can be described as the
growth of the folding nucleus: a wave of order moving across the protein as the edge of the
nucleus expands to ultimately consume the entire molecule.

The extended partially ordered interface of a capillarity-like ordered nucleus separates 
space into three regions: a folded core, a partially ordered interface region, and an unfolded
halo (see Fig. [1]). In this paper, we monitor the structural development of the nucleus along
the folding route through the evolution of the packing fraction of the folded core and the 
interface. As shown in Fig. [1], growth of the nucleus can be described by fluxes of residues 
passing through two moving surfaces: one surface separates the folded core and interface, 
and the other surface separates the interface region and the unfolded halo. As the protein 
folds, the evolution of the interface is determined by the interfacial volume and the net flux 
of residues entering the interface. 

Our analysis is based on folding routes calculated for 28 two-state proteins from a cooperative
variational model described elsewhere. We note this model includes neutral cooperativity
due to repulsive excluded volume interactions. This form of cooperativity has been shown
to broaden the range of barrier heights, allowing direct comparison between calculated and
measured folding rates. Not surprisingly, cooperativity tends to sharpen the interface between
folded and unfolded regions. Nevertheless, the interface from this model is generally
not nearly as sharp as in a strict capillarity description, in which a residue can be clearly identified
as being either completely folded or completely unfolded, as some other analytic models
assume. In fact, an unbiased analysis of the spatial properties of the folding nucleus
fundamentally depends on the model's ability to describe partial order.

Capillarity-like growth of folding nucleus 

Capillarity picture of folding nuclei. The capillarity approximation of folding nuclei
is based on classical nucleation theory of first order phase transition kinetics. Within
the capillarity approximation, the free energy of a nucleus with volume $V_f$ and surface area
$A_f$ can be written as a sum of two terms

$$F = -\Delta f \, V_f + \gamma A_f, \qquad (1)$$

where $\Delta f$ denotes the bulk free energy difference per unit volume between the unfolded and
folded ensembles, and $\gamma$ is the surface tension between the folded and unfolded regions.

A folded core with native-like density has a volume per monomer independent of its size.
Relaxing this assumption, we assume that the number of residues in the folded core, $N_f$,
and its volume, $V_f$, are related by

$$V_f = b^3 N_f^{3\nu}. \qquad (2)$$

Here, $\nu$ is the scaling exponent associated with the length scale of the folded core, $R \sim b N_f^{\nu}$,
and $b^3$ is a geometry dependent elementary volume proportional to the monomer volume,
$b_0^3$. The free energy of a folded nucleus with $N_f$ residues then has the form:

$$F(N_f) = -\Delta f \, b^3 N_f^{3\nu} + \gamma b^2 N_f^{2\nu}. \qquad (3)$$

At the folding transition temperature, $T_f$, finite size depression of the surface energy suggests
that $\gamma \sim \Delta f \, b N^{\nu}$, where $N$ is the number of monomers in the protein. The maximum of
the free energy occurs at $N_f^{*} = (2/3)^{1/\nu} N$, giving the size of the critical nucleus, and the
associated free energy barrier scales as $\Delta F^{\dagger} \sim N^{2\nu}$. If we assume that the folded core
has native-like packing, $\nu = 1/3$ and $b^3$ is the native-like volume per monomer, so that
$N_f^{*} = (2/3)^3 N$ and $\Delta F^{\dagger} \sim N^{2/3}$.
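The location of the barrier in Eq. (3) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the proportionality in $\gamma \sim \Delta f \, b N^{\nu}$ is an equality (so that $F(N) = 0$ at $T_f$), works in arbitrary units with $\Delta f = b = 1$, and uses hypothetical function names.

```python
import numpy as np

def capillarity_free_energy(n_f, n_total, nu=1/3, delta_f=1.0, b=1.0):
    """Eq. (3), assuming gamma = delta_f * b * N**nu exactly (so F(N) = 0)."""
    gamma = delta_f * b * n_total**nu
    bulk = -delta_f * b**3 * n_f**(3 * nu)      # stabilizing bulk term
    surface = gamma * b**2 * n_f**(2 * nu)      # destabilizing surface term
    return bulk + surface

def critical_nucleus(n_total, nu=1/3, delta_f=1.0, b=1.0, n_grid=200001):
    """Locate the maximum of F(N_f) on a fine grid of core sizes."""
    n_f = np.linspace(1e-6, n_total, n_grid)
    f = capillarity_free_energy(n_f, n_total, nu, delta_f, b)
    k = int(np.argmax(f))
    return n_f[k], f[k]

# Illustrative chain length of 100 monomers with native-like packing (nu = 1/3).
n_star, barrier = critical_nucleus(100.0)
print(n_star, (2 / 3) ** 3 * 100.0)   # grid maximum vs. analytic (2/3)^(1/nu) N
```

Under these assumptions the grid maximum should agree with $N_f^{*} = (2/3)^{1/\nu} N$ to within the grid spacing, and the barrier height equals $(4/27)\,\Delta f\, b^3 N^{3\nu}$, which is proportional to $\gamma b^2 N^{2\nu}$.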

Simulations and alternative theoretical considerations also suggest that the barrier height
(logarithm of the folding time) scales sublinearly with chain length, $\Delta F^{\dagger} \sim N^{p}$ with
$0 < p < 1$. Direct analysis of folding rate data to determine the scaling exponent $p$
encounters the difficulty that the range of $N$ is too small to distinguish between different
values of $p$. So while it may be reasonable to expect that the scaling of the
barrier height with chain length is universal for sufficiently large proteins, the size of a
typical two-state protein (~ 100 amino acids) may well be too small to be governed by
this generic behavior. In this case, both specificity and size of these smaller proteins should
generally determine the properties of the critical nuclei. In this paper, we assume that
Eq. [2] is valid to describe the growth of the nucleus in all the two-state proteins, but the
exponent $\nu$ and volume $b^3$ are allowed to be protein specific.

Characterizing the folded core and the interface. In the variational model considered
in this paper, partially ordered configurations are described by a variational Hamiltonian,
$\mathcal{H}_0$, corresponding to a stiff polymer chain inhomogeneously constrained to the native structure.
Since this model is described in detail elsewhere, we focus here on how to define the
folded core, interface, and unfolded regions along the calculated folding route. This is not
as straightforward as one might expect because the concept directly couples specificity of
the nucleus with the spatial density.

We characterize the degree of structure of each residue by the extent of localization about
the native structure $\{\mathbf{r}_i^N\}$, $\rho_i = \langle \exp(-\alpha^N (\mathbf{r}_i - \mathbf{r}_i^N)^2) \rangle_0$, with $\alpha^N = 0.1$. Here, the subscript 0
denotes the average with respect to the Boltzmann weight with $\mathcal{H}_0$. Denoting the native
density at the globule and native state by $\rho_i(G)$ and $\rho_i(N)$, respectively, we consider the
normalized density

$$\bar{\rho}_i = \frac{\rho_i - \rho_i(G)}{\rho_i(N) - \rho_i(G)} \qquad (4)$$

as a set of order parameters characterizing the folding of each residue. Progress along the
folding route can be monitored by the global structural parameter $Q = (1/N) \sum_i \bar{\rho}_i$.
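As a minimal sketch of how these order parameters could be evaluated once the per-residue native densities are available, the snippet below computes the normalized density of Eq. (4) and the global parameter $Q$. The function names and the sample numbers are illustrative assumptions, not values from the model.

```python
import numpy as np

def normalized_density(rho, rho_globule, rho_native):
    """Eq. (4): rescale each residue's native density to 0 (globule) .. 1 (native)."""
    rho = np.asarray(rho, dtype=float)
    rho_g = np.asarray(rho_globule, dtype=float)
    rho_n = np.asarray(rho_native, dtype=float)
    return (rho - rho_g) / (rho_n - rho_g)

def global_order(rho_bar):
    """Global structural parameter Q: mean of the normalized densities."""
    return float(np.mean(rho_bar))

# Hypothetical three-residue example: globule-like, half ordered, native-like.
rho_bar = normalized_density([0.1, 0.5, 0.9], [0.1, 0.1, 0.1], [0.9, 0.9, 0.9])
q = global_order(rho_bar)
```

In this toy example the three residues have normalized densities near 0, 0.5, and 1, so $Q$ is close to 0.5.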

The normalized native densities are used to define a fiducial set of folded residues, $\{F\}$,
with $\bar{\rho}_i > 0.6$, as shown in Fig. [2](a). Next, we define the spatial region of the folded core
through the relative contribution of the density of the folded residues in $\{F\}$,
$n_f(\mathbf{r}) = \sum_{i \in \{F\}} \langle \delta(\mathbf{r} - \mathbf{r}_i) \rangle_0$, to the total density,
$n(\mathbf{r}) = \sum_{i=1}^{N} \langle \delta(\mathbf{r} - \mathbf{r}_i) \rangle_0$. The spatial extent of the
folded core and interfacial regions in this analysis is determined by an indicator function

$$\hat{n}(\mathbf{r}) = \frac{n_f(\mathbf{r})}{n(\mathbf{r})}, \qquad (5)$$

which ranges from 0 to 1. We define the folded core region, $V_f$, as the points $\{\mathbf{r}_f\}$ for
which the density of the fiducial folded residues contributes at least 50% to the total density
($\hat{n}(\mathbf{r}) > 0.5$). The number of residues in this folded core region can be found by numerically
integrating the density over the core region, $N_f = \int_{V_f} n(\mathbf{r})\, d\mathbf{r}$. The volume of the core region
is given by $V_f = \int_{V_f} d\mathbf{r}$.

Similarly, the interfacial region, $V_{int}$, is defined as the points $\{\mathbf{r}_{int}\}$ for which
$0.1 < \hat{n}(\mathbf{r}_{int}) < 0.5$. The number of interfacial residues and the volume of the interface are given by
$N_{int} = \int_{V_{int}} n(\mathbf{r})\, d\mathbf{r}$ and $V_{int} = \int_{V_{int}} d\mathbf{r}$, respectively.

The number of residues and the volume can be used to define a mean packing fraction of
the folded core and partially ordered interface by

$$\mu_f = \frac{N_f}{V_f} v_0 \quad \text{and} \quad \mu_{int} = \frac{N_{int}}{V_{int}} v_0, \qquad (6)$$

respectively. Here, $v_0$ is the calculated volume per particle of the native structure at the
folding transition temperature, $T_f$. The growth of the nucleus can be characterized by the
way the packing fractions $\mu_f$ and $\mu_{int}$ change along the folding route.
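The region definitions and packing fractions above can be sketched on a discretized grid. This is an illustrative implementation under stated assumptions, not the procedure used in the paper: the densities are assumed to be sampled on a uniform grid with voxel volume `voxel_volume`, the integrals are replaced by voxel sums, points with indicator exactly 0.5 are assigned to the interface, and all names are hypothetical.

```python
import numpy as np

def region_packing(n_folded, n_total, voxel_volume, v0, core_cut=0.5, int_cut=0.1):
    """Return (residue count, volume, packing fraction) for core and interface."""
    n_folded = np.asarray(n_folded, dtype=float)
    n_total = np.asarray(n_total, dtype=float)
    # Indicator of Eq. (5); set to zero where the total density vanishes.
    indicator = np.divide(n_folded, n_total,
                          out=np.zeros_like(n_total), where=n_total > 0)
    core = indicator > core_cut
    interface = (indicator > int_cut) & ~core
    result = {}
    for name, mask in (("core", core), ("interface", interface)):
        n_res = n_total[mask].sum() * voxel_volume    # integral of n(r) over region
        volume = mask.sum() * voxel_volume            # integral of dr over region
        mu = n_res / volume * v0 if volume > 0 else float("nan")   # Eq. (6)
        result[name] = (n_res, volume, mu)
    return result

# Toy one-dimensional grid with four voxels (made-up densities).
out = region_packing(n_folded=[2.0, 1.2, 0.4, 0.0],
                     n_total=[2.0, 2.0, 2.0, 2.0],
                     voxel_volume=0.5, v0=0.25)
```

In the toy grid the indicator takes the values 1.0, 0.6, 0.2, and 0.0, so the first two voxels form the core, the third forms the interface, and the last belongs to the unfolded halo.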

Growth of folding nucleus along the folding route. As illustrated in Fig. [2](a-b), the
changes in $N_f$ and $V_f$ along a folding route can be fit to Eq. [2] to give an estimate of the
scaling exponent $\nu$ for each protein. Fig. [2](c) shows the distribution of predicted $\nu$ from
the folding routes obtained from the variational model for the 28 two-state proteins
studied. The predicted scaling exponent $\nu$ ranges from 0.2 to 0.42 with an average of
$\nu = 0.33$. The mean exponent is very close to the scaling associated with close-packed
rigid objects, $\nu = 1/3$. For comparison, recent detailed statistical models indicate that the
scaling exponent for the unfolded state of a protein is about $\nu = 0.59$, while analysis of the folded
states of a wide variety of proteins suggests that proteins with fewer than 300 amino acids have
compact folded structures ($\nu = 0.3$), whereas larger proteins are less dense ($\nu = 0.4$).

The mean packing fraction of the core scales with the number of monomers as

$$\mu_f = \frac{v_0}{b^3} N_f^{1 - 3\nu}. \qquad (7)$$

For the close packing value $\nu = 1/3$, $\mu_f$ is independent of the number of monomers and the
core retains native-like density as it grows. When $\nu > 1/3$, the core becomes less compact as
monomers are added to the core. This is the familiar scaling from loosely packed or fractal
objects. When $\nu < 1/3$, the core density increases as more monomers are incorporated into
the core. This can be understood as the consolidation of structure in the folded core as
folding progresses.
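The fit of Eq. (2) and the resulting scaling of Eq. (7) can be sketched as a linear regression in log-log space. The snippet below is a hedged illustration: it assumes natural logarithms (the base used for the published fits is not stated here), uses synthetic data generated with $\nu = 0.3$, and the function names are hypothetical.

```python
import numpy as np

def fit_scaling_exponent(n_f, v_f):
    """Fit log V_f = log b^3 + 3 nu log N_f; return (nu, b^3)."""
    slope, intercept = np.polyfit(np.log(n_f), np.log(v_f), 1)
    return slope / 3.0, float(np.exp(intercept))

def packing_fraction(n_f, nu, b_cubed, v0):
    """Eq. (7): mu_f = (v0 / b^3) * N_f**(1 - 3 nu)."""
    return v0 / b_cubed * np.asarray(n_f, dtype=float) ** (1.0 - 3.0 * nu)

# Synthetic folding route: V_f = 2 * N_f**0.9, i.e. nu = 0.3 and b^3 = 2.
n_f = np.array([5.0, 10.0, 20.0, 40.0])
v_f = 2.0 * n_f ** 0.9
nu, b_cubed = fit_scaling_exponent(n_f, v_f)
mu_f = packing_fraction(n_f, nu, b_cubed, v0=1.0)
```

Because the synthetic exponent is below 1/3, the computed packing fraction increases with $N_f$, which corresponds to the consolidation regime described above.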

Although the spatial structure of the critical folding nucleus (transition state ensemble) 
is discussed in more detail later, it is instructive to consider the value of the mean packing 
fraction of the core here. Fig. [2](d) shows the distribution of packing fractions of the folded
core evaluated at the maximum free energy barrier between folded and unfolded states at
$T_f$. The packing fraction has a wide range from 0.2 to 1.0. While some of the transition
state nuclei have compact cores, the average packing fraction is only 0.59. This means that
although the growth of a typical folded core corresponds to rigidly packed objects, a typical
transition state ensemble has a folded core with roughly twice the volume of the same
number of monomers in the native state conformation ($b^3 \approx 2 v_0$). That is, the monomers
composing the nucleus are typically much less localized than in the native state.

Fig. [3] illustrates a typical example of the growth of the folded core and the interface 
region. Early in the folding, we see a small compact nucleus surrounded by a partially 
folded interface. This small nucleus is partially ordered, occupying about twice the volume 
as the corresponding residues in the native state. Structural fluctuations giving nuclei 
corresponding to $Q < Q^{\dagger}$ are unstable with respect to the unfolded state due to the relatively
large surface free energy cost associated with small nuclei, whereas structural fluctuations
with $Q > Q^{\dagger}$ will tend to evolve to the folded state. As the nucleus grows, the volume
of the nucleus evolves as interfacial regions are incorporated into the core, while unfolded 
regions become part of the partially ordered interface. 



Growth patterns of the nucleus: The structural growth of the folding nucleus can be
understood as the competition between growth of the folded core and the evolution of the
interface. The flux of residues entering the core through the interface region controls the growth
of the core, while the net flux of residues entering the interface region from the unfolded halo
controls the growth of the interface (see Fig. [1]). We characterize the evolution of the
structure of the folding nucleus by the changes in the packing fraction of the core and interface
regions as a function of $Q$. That is, we consider the signs of

$$\dot{\mu}_f(Q) = \frac{d\mu_f}{dQ} \quad \text{and} \quad \dot{\mu}_{int}(Q) = \frac{d\mu_{int}}{dQ} \qquad (8)$$

to identify different modes of growth. From the two-state proteins used in this study, we
can identify three distinct scenarios as illustrated in Fig. [4].

• Pattern A (consolidation of core and interface). As shown in Fig. [4](a), the density of
both the core and interface increase along the folding route ($\dot{\mu}_f > 0$ and $\dot{\mu}_{int} > 0$).
The size of the core increases with $Q$, but $V_f$ increases more slowly than $N_f$ (see Eq. [7]
with $\nu < 1/3$). Similarly, $N_{int}$ and $V_{int}$ both increase with $Q$ throughout much of
the growth. At larger $Q$, $V_{int}$ reaches a maximum and subsequently decreases rapidly
as interfacial residues are consumed by the folded core.

• Pattern B (core consolidation dominated growth). As shown in Fig. [4](b), the growth of
the core is similar to Pattern A ($\dot{\mu}_f > 0$), while the density of the interface decreases
($\dot{\mu}_{int} < 0$). The difference between Pattern A and Pattern B growth is that in
Pattern A the core and interface expand together relatively rapidly, while in Pattern
B the core grows at the expense of the interface.

• Pattern C (balanced growth). As shown in Fig. [4](c), the packing fractions of both the
interface and core are roughly constant through much of the folding in Pattern C growth
($\dot{\mu}_f \approx 0$ and $\dot{\mu}_{int} \approx 0$). Here, as the nucleus grows, the interfacial residues
incorporated into the folded core are balanced by unfolded residues entering the interfacial
region.

The growth mode of the nucleus for the 27 proteins considered in this paper (1pgb16 is
too small to have a compact folded core) can be roughly classified as follows: Pattern A
(1pgb, 1a0n, 2ptl, 1shg, 1psf, 1pks, 1pin, 1c8c, 1fkb, 1fnf9, 1wit, 1urn); Pattern B (2pdd,
1enh, 1coa, 1vii, 1aps, 1imq, 2abd, 1hdn, 1div); Pattern C (1lmb, 1csp, 1srl, 1ten, 1o6x,
1mef).
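The sign criteria of Eq. (8) can be turned into a rough automatic classifier. The sketch below is only one possible reading, not the authors' procedure: it assumes the overall trend of each packing fraction along the route can be summarized by the slope of a straight-line fit in $Q$, and it introduces a hypothetical tolerance `tol` below which a slope is treated as zero.

```python
import numpy as np

def classify_growth(q, mu_f, mu_int, tol=0.1):
    """Assign Pattern A, B, or C from the average slopes of mu_f(Q) and mu_int(Q)."""
    q = np.asarray(q, dtype=float)
    slope_f = np.polyfit(q, np.asarray(mu_f, dtype=float), 1)[0]
    slope_int = np.polyfit(q, np.asarray(mu_int, dtype=float), 1)[0]
    if abs(slope_f) < tol and abs(slope_int) < tol:
        return "C"            # balanced growth: both densities roughly constant
    if slope_f > 0 and slope_int > 0:
        return "A"            # core and interface both consolidate
    if slope_f > 0 and slope_int < 0:
        return "B"            # core consolidates at the expense of the interface
    return "unclassified"

# Made-up packing fraction profiles along a folding route.
q = np.linspace(0.2, 0.8, 7)
print(classify_growth(q, 0.3 + 0.5 * q, 0.2 + 0.3 * q))
print(classify_growth(q, 0.3 + 0.5 * q, 0.5 - 0.3 * q))
print(classify_growth(q, np.full(7, 0.6), np.full(7, 0.4)))
```

A single average slope cannot capture non-monotonic behavior, such as the late collapse of the interface in Pattern A, so a classifier of this kind would at best complement inspection of the full curves.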



Polarized vs diffuse critical nucleus. 

A folding mechanism is typically characterized by the structure of the critical nucleus. 
The spatial structure of the transition state ensemble, inferred from $\phi$-value analysis, has
often been qualitatively summarized as either diffuse or polarized. Intermediate $\phi$-values
spread across a large portion of the protein sequence indicate a diffuse nucleus. In contrast,
polarized transition states are inferred when only one part of the structure has relatively high
$\phi$-values while the rest of the residues have low $\phi$-values. In addition to a bimodal distribution
of $\phi$-values, the ordered residues in a polarized transition state ensemble are located in one
region in the native configuration. Polarized and diffuse critical nuclei are sometimes called
localized and delocalized transition state ensembles, respectively. Of course, the critical
nucleus of a given protein is expected to have structural properties somewhere between the
two limits of polarized and diffuse. The second row of Fig. [3] gives an example of a diffuse
critical nucleus (1lmb). For comparison, Fig. [5] shows the corresponding plots for a protein
with a polarized critical nucleus (1srl). Comparing Fig. [3] and Fig. [5], it is clear that
the interface of 1lmb is much broader than the interface region of 1srl. Furthermore, the
folded core of 1lmb is much more diffuse than the folded core of 1srl.

Characterizing a capillarity-like ordered nucleus as either diffuse or polarized is a statement
about the sharpness of the interface as well as the compactness of the core. For convenience, we
monitor both regions by the normalized volume per monomer (inverse packing fraction):
$1/\mu_f$ and $1/\mu_{int}$. The results for the two-state proteins considered in this work are shown
in Fig. [6]. Nuclei with small values of $1/\mu_f$ and $1/\mu_{int}$ are more polarized, with relatively
compact cores and sharp interfaces (similar to those envisioned in the strict capillarity
approximation). Diffuse nuclei, on the other hand, have extended regions of partial order, which
corresponds to large values of $1/\mu_f$ and/or $1/\mu_{int}$. We note that relatively polarized nuclei
can have cores that are still loosely packed compared to the native state density (e.g., 1pgb).
Furthermore, relatively diffuse nuclei can have tightly packed cores but extended interfaces
(e.g., 2abd, 1imq, 1fkb).

Our predictions for proteins with polarized transition states, such as 1csp, 1srl, 1shg,
1pin, 2ptl, and 1pgb, are consistent with the classification inferred from experimental $\phi$-value
analysis. Several proteins classified as having diffuse nuclei are also predicted to be diffuse by
our model, such as 1lmb, 2abd, 1imq, and 1fkb. Nevertheless, the predictions are at odds with
experimental measurement for a few proteins. Our model predicts that the folding nucleus
of CI2, perhaps the archetype for a diffuse transition state ensemble, is relatively polarized.
This is also true for 1aps. Another exception is U1A, which has been shown experimentally
to have an early, delocalized transition state. The calculated folding route from this
cooperative model has several transition states along the folding path, but the highest barrier
corresponds to a late, polarized nucleus. If we look at the earlier transition state ensembles
(also shown in Fig. [6]), the structure of the nucleus is much more diffuse. The same situation
arises in 1pgb, for which the calculated folding route has two transition states; the early
one, which is more diffuse, has a lower free energy than the late one, which is more polarized.

We note that in these exceptional cases, $\phi$-value distributions indicate the critical nucleus
is rather diffuse while our model predicts more polarized nuclei. This tendency can be
understood as a consequence of the model being overly cooperative for these proteins, since
cooperativity generically tends to sharpen the interface between the folded and unfolded
regions, and hence is somewhat biased towards polarized transition states.

Conclusion 

In this paper, we directly characterize folding in terms of the capillarity-like growth of the 
folding nucleus. The nature of the partially folded interfacial region between the folded core 
and unfolded halo is the central focus of characterizing the growth modes of the nucleus. 
We find that the growth of the nucleus can be classified into three different patterns: (A) 
the core and interface both condense along the folding route; (B) the core condenses at the 
expense of the interfacial region; and (C) the growth of the core is balanced by the monomers 
entering the interfacial region from the unfolded halo. The picture of the core as close 
packing of rigid monomers appears to be valid on average, though the size of the effective 
monomers is larger than one would expect for a native-like, compact core. Furthermore, 
this analysis clarifies that diffuse nuclei inferred from the distribution of intermediate $\phi$-values,
for example, can arise from a diffuse folded core, a broad interfacial region, or both.
The predictions from our calculations can be tested through analysis of the evolution of
$\phi$-values as a function of the movement of the transition state ensemble, pioneered by
Oliveberg and co-workers.

The variational model considered here includes a uniform "neutral", excluded volume
type cooperativity developed to account for general trends in the absolute folding rates of
two-state proteins. The exceptional qualitative discrepancies in the polarized versus
diffuse characterization of the critical nucleus (such as CI2, 1aps, 1pgb, and U1A) offer an
opportunity to assess the form and strength of the cooperativity of this model. The spatial
density of the critical nucleus can be used as an independent criterion to check the value
of the cooperativity obtained by simultaneously fitting $\phi$-values and barrier height by the
parameterization of the cooperativity for each protein. There are some indications that one
should consider variations in the strength of the cooperativity for different proteins (though,
admittedly, this is very closely tied to the specific form of the cooperativity in the model).
For example, Ejtehadi and Plotkin recently found that the strength of cooperativity from
three-body interactions can be tuned for each protein to bring simulations of $\phi$-values into
better agreement with experimental measurement. Furthermore, detailed analysis from
a similar variational model suggests that the cooperativity of the U1A protein is much
lower than assumed in this model. The generally good qualitative agreement between our
calculations and experimental inferences about the spatial extent of folding nuclei suggests
that tuning the excluded volume strength for each protein would not greatly improve the
results presented here for the majority of the proteins studied.

Figures




FIG. 1: Illustration of folding nucleus: folded core, interfacial region, and unfolded halo. Growth 
of the nucleus can be characterized by fluxes entering the folded core and interfacial regions.

FIG. 2: Scaling of the folded core with number of monomers. (a-b) correspond to λ-repressor
(1lmb). (a) Residues with native density $\bar{\rho}_i > 0.6$ (indicated by the dashed line) define a fiducial set
of folded residues. (b) Linear fit of $\log V_f$ vs. $\log N_f$ (dashed line) gives the exponent of $V_f \sim N_f^{3\nu}$.
In this example, the fitting equation is $y = 5.6 + 0.97x$, so that $\nu = 0.32$ and $b^3 = 5.6a^3$, where $a = 3.8$ Å
is the average distance between the α carbons. (c) Histogram of scaling exponent $\nu$ for 28 proteins.
(d) Histogram of the packing fraction of the folded core of the critical nucleus at $T_f$.

FIG. 3: Illustration of growth of folding nucleus and interface along the folding route (increasing
$Q$) for the λ-repressor protein (1lmb). Column 1 shows the three-dimensional folded structure with
the fiducial set of folded residues colored blue and the unfolded residues colored red. In Column 2,
the folded core (colored blue) is surrounded by the interfacial region (colored green). Column 3 is
a projection of the indicator function $\hat{n}(\mathbf{r})$ that defines the folded and interfacial regions in space.
The values correspond to $\max_z \hat{n}(x, y, z)$, ranging from 1 (blue) to 0 (red) in steps of 0.01. Contour
lines correspond to 0.1, 0.5, 0.7. Column 4 gives the corresponding $Q$ value for each row. The
critical nucleus corresponds to $Q = 0.53$. Lengths in the three plots are in Angstroms. This protein
belongs to Pattern C (balanced growth). The three-dimensional structure was produced by
VMD.

FIG. 4: Examples of three modes of growth of the nucleus. Patterns A-C correspond to (a-c),
respectively. The solid line corresponds to the mean packing fraction of the folded core, $\mu_f$, while the
dashed line corresponds to the mean packing fraction of the interface, $\mu_{int}$.

FIG. 5: An example of a polarized critical nucleus ($Q = 0.45$) for Src-SH3 (1srl). Plots (a-c)
correspond to the middle row of the diffuse nucleus shown in Fig. [3].

FIG. 6: Inverse packing fraction of the interface and folded core for 27 two-state proteins. The
gradual change in color shows the continuous change from polarized nuclei (red) to diffuse nuclei
(cyan). Also shown are two early transition states of U1A (1urn) and a late transition state of protein
G (1pgb).